A Framework for Authorial Clustering of Shorter Texts in Latent Semantic Spaces

Abstract

Authorial clustering involves the grouping of documents written by the same author or team of authors without any prior positive examples of an author’s writing style or thematic preferences. For authorial clustering on shorter texts (paragraph-length texts that are typically shorter than conventional documents), the document representation is particularly important. We propose a high-level framework which utilizes a compact data representation in a latent feature space derived with non-parametric topic modeling. Authorial clusters are identified thereafter in two scenarios: (a) fully unsupervised and (b) semi-supervised, where a small number of shorter texts are known to belong to the same author (must-link constraints) or not (cannot-link constraints). We report on experiments with 120 collections in three languages and two genres and show that the topic-based latent feature space provides a promising level of performance while reducing the dimensionality by a factor of 1500 compared to the state of the art. We also demonstrate that a little knowledge on constraints in authorial cluster memberships leads to auspicious improvements on this difficult task.
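
The semi-supervised scenario can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each text has already been mapped to a vector of topic proportions by some topic model, and it swaps in a simple centroid-linkage agglomerative procedure to show how must-link and cannot-link constraints can steer the grouping. The names `constrained_agglomerative`, `topic_vectors`, and the toy data are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def constrained_agglomerative(topic_vectors, n_clusters,
                              must_link=(), cannot_link=()):
    """Group texts given as topic-proportion vectors (illustrative only).

    must_link and cannot_link are sequences of (i, j) index pairs.
    Returns one cluster label per text.
    """
    # Start with one cluster per text.
    clusters = [{i} for i in range(len(topic_vectors))]

    def find(doc):
        return next(c for c in clusters if doc in c)

    def violates(a, b):
        # True if merging a and b would join a cannot-link pair.
        return any((i in a and j in b) or (i in b and j in a)
                   for i, j in cannot_link)

    # Must-link pairs are merged up front.
    for i, j in must_link:
        a, b = find(i), find(j)
        if a is not b:
            if violates(a, b):
                raise ValueError("must-link and cannot-link constraints conflict")
            clusters.remove(b)
            a |= b

    # Greedily merge the closest pair that no cannot-link constraint forbids.
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(clusters, 2):
            if violates(a, b):
                continue
            d = cosine_distance(centroid([topic_vectors[i] for i in a]),
                                centroid([topic_vectors[i] for i in b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        if best is None:  # every remaining merge is forbidden
            break
        _, a, b = best
        clusters.remove(b)
        a |= b

    labels = [0] * len(topic_vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy topic proportions over three topics (made-up numbers).
docs = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.85, 0.05],
    [0.05, 0.90, 0.05],
    [0.50, 0.45, 0.05],  # ambiguous text
]

unsupervised = constrained_agglomerative(docs, n_clusters=2)
semi = constrained_agglomerative(docs, n_clusters=2,
                                 must_link=[(3, 4)], cannot_link=[(0, 4)])
```

In this toy setting the ambiguous fifth text joins the first group when no constraints are given, while a single must-link and a single cannot-link pair move it to the second group, which mirrors the abstract's point that a small amount of constraint knowledge can change the outcome.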

Similar resources

A Framework for Identifying and Prioritizing Factors Affecting Customers’ Online Shopping Behavior in Iran

The purpose of this study is to identify the factors that lead customers to shop online in Iran and to investigate the importance of the discovered factors in online customers’ decisions. In the identification phase, to discover the factors affecting the online shopping behavior of customers in Iran, the derived reference model summarizing antecedents of online shopping proposed by Chang et al. was us...

Latent Semantic Space for Web Clustering

Organizing a huge number of Web pages into topics according to their relevance is an efficient and effective method for information retrieval. Latent Semantic Space (LSS), which arises naturally in the form of some geometric structure in combinatorial topology, has been proposed for unstructured document clustering. Given a set of Web pages, the set of associations among frequently co-occurring terms in t...

The Role of Semantic and Communicative Translation on Reading Comprehension of Scientific Texts

The following null hypothesis was proposed: H0: there is no significant difference between the use of semantically or communicatively translated scientific texts. To test the null hypothesis, a number of procedures were taken. First, two passages were selected from sourcebooks of the food and nutrition industry and gardening disciplines. Each, in turn, was followed by a number of comprehension quest...

Fuzzy clustering of semantic spaces

In this paper, the GK` model for the construction of thesaurus classes based on a fuzzy semantic association measure between index terms and concepts (thesaurus classes) is presented. The association measure is obtained on the basis of fuzzy semantic relations between

Double Clustering in Latent Semantic Indexing

Document clustering is a widely researched area of information retrieval. The large number of documents that must be handled calls for automatic organization. A popular approach to clustering documents and messages is the vector space model, which represents texts with feature vectors, usually generated from the set of terms contained in the message. The clustering based on the document-term frequen...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2021

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-030-74251-5_24